{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "0e3e4245",
   "metadata": {},
   "source": [
    "<a href=\"https://colab.research.google.com/github/run-llama/llama_index/blob/main/docs/examples/query_engine/sub_question_query_engine.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "954e61ed-b3b3-4445-af90-19046f6c1da2",
   "metadata": {},
   "source": [
    "# Sub Question Query Engine\n",
    "In this tutorial, we showcase how to use a **sub question query engine** to tackle the problem of answering a complex query using multiple data sources.  \n",
    "It first breaks down the complex query into sub questions for each relevant data source,\n",
    "then gather all the intermediate reponses and synthesizes a final response."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "75ac0675-cde6-49ae-bfb3-4c43e6c4a718",
   "metadata": {},
   "source": [
    "### Preparation"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a005a37e",
   "metadata": {},
   "source": [
    "If you're opening this Notebook on colab, you will probably need to install LlamaIndex 🦙."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2b740633",
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install llama-index"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4d97d35f",
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "\n",
    "os.environ[\"OPENAI_API_KEY\"] = \"sk-...\"\n",
    "\n",
    "import nest_asyncio\n",
    "\n",
    "nest_asyncio.apply()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c3675b87-4d08-4f59-a971-48bf67c66ad0",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.core import VectorStoreIndex, SimpleDirectoryReader\n",
    "from llama_index.core.tools import QueryEngineTool, ToolMetadata\n",
    "from llama_index.core.query_engine import SubQuestionQueryEngine\n",
    "from llama_index.core.callbacks import CallbackManager, LlamaDebugHandler\n",
    "from llama_index.core import Settings"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2b95fb5b",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Using the LlamaDebugHandler to print the trace of the sub questions\n",
    "# captured by the SUB_QUESTION callback event type\n",
    "llama_debug = LlamaDebugHandler(print_trace_on_end=True)\n",
    "callback_manager = CallbackManager([llama_debug])\n",
    "\n",
    "Settings.callback_manager = callback_manager"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f07aabc9",
   "metadata": {},
   "source": [
    "### Download Data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5c9cfa51",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Will not apply HSTS. The HSTS database must be a regular and non-world-writable file.\n",
      "ERROR: could not open HSTS store at '/home/loganm/.wget-hsts'. HSTS will be disabled.\n",
      "--2024-01-28 11:27:04--  https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/paul_graham/paul_graham_essay.txt\n",
      "Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.111.133, 185.199.108.133, 185.199.109.133, ...\n",
      "Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.111.133|:443... connected.\n",
      "HTTP request sent, awaiting response... 200 OK\n",
      "Length: 75042 (73K) [text/plain]\n",
      "Saving to: ‘data/paul_graham/paul_graham_essay.txt’\n",
      "\n",
      "data/paul_graham/pa 100%[===================>]  73.28K  --.-KB/s    in 0.04s   \n",
      "\n",
      "2024-01-28 11:27:05 (1.73 MB/s) - ‘data/paul_graham/paul_graham_essay.txt’ saved [75042/75042]\n",
      "\n"
     ]
    }
   ],
   "source": [
    "!mkdir -p 'data/paul_graham/'\n",
    "!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "22b8b519-dd95-4dfb-9cdc-cb0004852036",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "**********\n",
      "Trace: index_construction\n",
      "    |_CBEventType.NODE_PARSING ->  0.112481 seconds\n",
      "      |_CBEventType.CHUNKING ->  0.105627 seconds\n",
      "    |_CBEventType.EMBEDDING ->  0.959998 seconds\n",
      "**********\n"
     ]
    }
   ],
   "source": [
    "# load data\n",
    "pg_essay = SimpleDirectoryReader(input_dir=\"./data/paul_graham/\").load_data()\n",
    "\n",
    "# build index and query engine\n",
    "vector_query_engine = VectorStoreIndex.from_documents(\n",
    "    pg_essay,\n",
    "    use_async=True,\n",
    ").as_query_engine()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "484cb516-6692-4d07-9bd8-93941909b459",
   "metadata": {},
   "source": [
    "### Setup sub question query engine"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "dbff0cbd-9a1a-4acc-9870-03d05640125c",
   "metadata": {},
   "outputs": [],
   "source": [
    "# setup base query engine as tool\n",
    "query_engine_tools = [\n",
    "    QueryEngineTool(\n",
    "        query_engine=vector_query_engine,\n",
    "        metadata=ToolMetadata(\n",
    "            name=\"pg_essay\",\n",
    "            description=\"Paul Graham essay on What I Worked On\",\n",
    "        ),\n",
    "    ),\n",
    "]\n",
    "\n",
    "query_engine = SubQuestionQueryEngine.from_defaults(\n",
    "    query_engine_tools=query_engine_tools,\n",
    "    use_async=True,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "eeea8e15-78ab-4f72-a380-1a76bb5d5737",
   "metadata": {},
   "source": [
    "### Run queries"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a576ce5e-a6d1-470d-be51-84fbb31b4aa2",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Generated 3 sub questions.\n",
      "\u001b[1;3;38;2;237;90;200m[pg_essay] Q: What did Paul Graham work on before YC?\n",
      "\u001b[0m\u001b[1;3;38;2;90;149;237m[pg_essay] Q: What did Paul Graham work on during YC?\n",
      "\u001b[0m\u001b[1;3;38;2;11;159;203m[pg_essay] Q: What did Paul Graham work on after YC?\n",
      "\u001b[0m\u001b[1;3;38;2;11;159;203m[pg_essay] A: After YC, Paul Graham worked on starting his own investment firm with Jessica.\n",
      "\u001b[0m\u001b[1;3;38;2;90;149;237m[pg_essay] A: During his time at YC, Paul Graham worked on various projects. He wrote all of YC's internal software in Arc and also worked on Hacker News (HN), which was a news aggregator initially meant for startup founders but later changed to engage intellectual curiosity. Additionally, he wrote essays and worked on helping the startups in the YC program with their problems.\n",
      "\u001b[0m\u001b[1;3;38;2;237;90;200m[pg_essay] A: Paul Graham worked on writing essays and working on YC before YC.\n",
      "\u001b[0m**********\n",
      "Trace: query\n",
      "    |_CBEventType.QUERY ->  66.492657 seconds\n",
      "      |_CBEventType.LLM ->  2.226621 seconds\n",
      "      |_CBEventType.SUB_QUESTION ->  62.387177 seconds\n",
      "        |_CBEventType.QUERY ->  62.386864 seconds\n",
      "          |_CBEventType.RETRIEVE ->  0.271039 seconds\n",
      "            |_CBEventType.EMBEDDING ->  0.269134 seconds\n",
      "          |_CBEventType.SYNTHESIZE ->  62.115674 seconds\n",
      "            |_CBEventType.TEMPLATING ->  2.8e-05 seconds\n",
      "            |_CBEventType.LLM ->  62.108522 seconds\n",
      "      |_CBEventType.SUB_QUESTION ->  2.421552 seconds\n",
      "        |_CBEventType.QUERY ->  2.421303 seconds\n",
      "          |_CBEventType.RETRIEVE ->  0.227773 seconds\n",
      "            |_CBEventType.EMBEDDING ->  0.224198 seconds\n",
      "          |_CBEventType.SYNTHESIZE ->  2.193355 seconds\n",
      "            |_CBEventType.TEMPLATING ->  4.2e-05 seconds\n",
      "            |_CBEventType.LLM ->  2.183101 seconds\n",
      "      |_CBEventType.SUB_QUESTION ->  1.530997 seconds\n",
      "        |_CBEventType.QUERY ->  1.530781 seconds\n",
      "          |_CBEventType.RETRIEVE ->  0.25523 seconds\n",
      "            |_CBEventType.EMBEDDING ->  0.252898 seconds\n",
      "          |_CBEventType.SYNTHESIZE ->  1.275401 seconds\n",
      "            |_CBEventType.TEMPLATING ->  3.2e-05 seconds\n",
      "            |_CBEventType.LLM ->  1.26685 seconds\n",
      "      |_CBEventType.SYNTHESIZE ->  1.877223 seconds\n",
      "        |_CBEventType.TEMPLATING ->  1.6e-05 seconds\n",
      "        |_CBEventType.LLM ->  1.875031 seconds\n",
      "**********\n"
     ]
    }
   ],
   "source": [
    "response = query_engine.query(\n",
    "    \"How was Paul Grahams life different before, during, and after YC?\"\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "222d16a7-089a-4de0-b551-fcd785f3eb26",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Paul Graham's life was different before, during, and after YC. Before YC, he focused on writing essays and working on YC. During his time at YC, he worked on various projects, including writing software, developing Hacker News, and providing support to startups in the YC program. After YC, he started his own investment firm with Jessica. These different phases in his life involved different areas of focus and responsibilities.\n"
     ]
    }
   ],
   "source": [
    "print(response)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "fc38e65a",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Sub Question 0: What did Paul Graham work on before YC?\n",
      "Answer: Paul Graham worked on writing essays and working on YC before YC.\n",
      "====================================\n",
      "Sub Question 1: What did Paul Graham work on during YC?\n",
      "Answer: During his time at YC, Paul Graham worked on various projects. He wrote all of YC's internal software in Arc and also worked on Hacker News (HN), which was a news aggregator initially meant for startup founders but later changed to engage intellectual curiosity. Additionally, he wrote essays and worked on helping the startups in the YC program with their problems.\n",
      "====================================\n",
      "Sub Question 2: What did Paul Graham work on after YC?\n",
      "Answer: After YC, Paul Graham worked on starting his own investment firm with Jessica.\n",
      "====================================\n"
     ]
    }
   ],
   "source": [
    "# iterate through sub_question items captured in SUB_QUESTION event\n",
    "from llama_index.core.callbacks import CBEventType, EventPayload\n",
    "\n",
    "for i, (start_event, end_event) in enumerate(\n",
    "    llama_debug.get_event_pairs(CBEventType.SUB_QUESTION)\n",
    "):\n",
    "    qa_pair = end_event.payload[EventPayload.SUB_QUESTION]\n",
    "    print(\"Sub Question \" + str(i) + \": \" + qa_pair.sub_q.sub_question.strip())\n",
    "    print(\"Answer: \" + qa_pair.answer.strip())\n",
    "    print(\"====================================\")"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
